Virus Research, 22 (1992) 125-141
© 1992 Elsevier Science Publishers B.V. All rights reserved 0168-1702/92/$05.00


VIRUS 00728 


Sequence analysis of human coronavirus 229E 
mRNAs 4 and 5: evidence for polymorphism 
and homology with myelin basic protein 


Patricia Jouvenne ¹, Samir Mounir ¹, Janet N. Stewart ¹,
Christopher D. Richardson ² and Pierre J. Talbot ¹


¹ Institut Armand-Frappier, Université du Québec, Virology Research Center, Laval, Québec, Canada and
² Biotechnology Research Center, National Research Council of Canada, Montréal, Québec, Canada


(Received 1 May 1991; revision received and accepted 31 October 1991) 


Summary 


Human coronaviruses (HCV) are important pathogens responsible for respira- 
tory, gastrointestinal and possibly neurological disorders. To better understand the 
molecular biology of the prototype HCV-229E strain, the nucleotide sequences of
the 5’-unique regions of mRNAs 4 and 5 were determined from cloned cDNAs.
Sequence analysis of the cDNAs synthesized from mRNA 4 revealed a major 
difference with previously published results. However, polymerase chain reaction 
amplification of this region showed that the sequenced cDNAs were produced 
from minor RNA species, an indication of possible genetic polymorphism in this 
region of the viral genome. The mutated messenger RNA 4 contains two ORFs: 
(1) ORF4a consisting of 132 nucleotides which potentially encodes a 44-amino acid 
polypeptide of 4653 Da; this coding sequence is preceded by a consensus transcrip- 
tional initiation sequence, CUAAACU, similar to the ones found upstream of the 
N and M genes; (2) ORF4b of 249 nucleotides potentially encoding an 83-amino 
acid basic and leucine-rich polypeptide of 9550 Da. On the other hand, mRNA 5 
contains one single ORF of 231 nucleotides which could encode a 77-amino acid 
basic and leucine-rich polypeptide of 9046 Da. This putative protein presents a 
significant degree of amino acid homology (33%) with its counterpart found in 
transmissible gastroenteritis coronavirus (TGEV). The proteins in the two differ- 
ent viruses exhibit similar molecular weights and are extremely hydrophobic. 


Correspondence to: P.J. Talbot, Centre de Recherche en Virologie, Institut Armand-Frappier, 531 boul. 
des Prairies, Laval, Qué., Canada H7N 4Z3. 
Interestingly, a sequence homology of five amino acids was found between the
protein encoded by ORF4b of HCV-229E and an immunologically important 
region of human myelin basic protein. 


Coronavirus; Human; 229E; Myelin basic protein; Polymorphism; mRNA 4; mRNA 
5; Nucleotide sequence 


Introduction 


The interest in the molecular biology of human coronaviruses (HCV) has 
increased considerably in the last few years. This observation is not surprising since 
they are important human pathogens responsible for up to 25% of common colds 
(McIntosh et al., 1974; Wege et al., 1982), some gastrointestinal infections (Resta 
et al., 1985), and possibly neurological disorders such as multiple sclerosis (MS) or
Parkinson’s disease (Fishman et al., 1985). Their potential involvement in multiple 
sclerosis was suggested by the observation of coronavirus-like particles in the brain 
of one MS patient (Tanaka et al., 1976), the isolation of coronaviruses from two 
MS brain tissues passaged in mice (Burks et al., 1980), the detection of intrathecal 
antibodies to HCV-OC43 and HCV-229E in MS patients (Salmi et al., 1982), and 
the preferential detection of coronavirus RNA in central nervous system (CNS) 
tissue from MS patients (Murray et al., 1990). Moreover, several excellent indica- 
tions suggest that MS could be the consequence of a virus-induced autoimmune 
disease. One of the possible mechanisms involved could be a molecular mimicry 
resulting from a sequence homology between a viral protein and myelin basic 
protein (Watanabe et al., 1983; Jahnke et al., 1985; Oldstone, 1987). However, 
since appropriate diagnostic tools have been lacking, the association of human 
coronaviruses with neurological disorders has not yet been confirmed. With this 
objective in mind, we have initiated studies on the molecular biology of the 
prototype 229E strain of HCV (HCV-229E). 

This virus possesses a single-stranded, positive-sense RNA genome with a 
molecular weight of approximately 6 x 10⁶ (Hierholzer et al., 1981). Six subgenomic
RNA species are synthesized in infected cells and appear to have lower
molecular weights than their counterparts from murine hepatitis virus (MHV) 
(Weiss and Leibowitz, 1981). Northern blot analysis has confirmed that, like other 
coronaviruses, they constitute a nested set of 3’-coterminal mRNA species 
(Schreiber et al., 1989) of which presumably only the 5’-unique regions are 
translated (reviewed in Spaan et al., 1988). At least four polypeptides have been
found in purified HCV-229E virions: 160- to 200-kDa and 88- to 105-kDa glycopro- 
teins which may be analogous to the spike glycoprotein (S) of MHV (Sturman et 
al., 1985); a 47- to 53-kDa polypeptide corresponding to the nucleocapsid protein 
(N), and a 17- to 26-kDa membrane protein (M) which is found in both glycosy- 
lated and non-glycosylated forms (Hierholzer, 1976; Macnaughton, 1980; Schmidt 
and Kenny, 1982; Arpin and Talbot, 1990). Another author also reported glycoproteins
of 31 and 65 kDa (Hierholzer, 1976). Non-structural proteins have not been
described, although in vitro translation of viral mRNAs yielded potential non- 
structural polypeptides of 42, 28.5 and 14 kDa (Jouvenne et al., 1990), which could 
also be observed in infected cells (Talbot et al., unpublished results). 

The nucleotide sequences of the genes encoding the N and M proteins of
HCV-229E have previously been reported (Schreiber et al., 1989; Raabe and 
Siddell, 1989a; Jouvenne et al., 1990). Moreover, the nucleotide sequence of the 
unique regions of mRNAs 4 and 5 has been determined in one other laboratory 
(Raabe and Siddell, 1989b). According to the latter report, mRNA 4 contains one 
ORF of 399 nucleotides, while mRNA 5 contains two ORFs (ORF5A and ORF5B)
of 264 and 231 nucleotides, respectively. However, these authors later corrected 
these mRNA assignments and found that mRNA 4, not 5, contained two ORFs: 4a 
and 4b (Raabe et al., 1990). In the present study, we report nucleotide sequence
data from mRNAs 4 and 5 of HCV-229E, and observe a major difference with 
previously published data. The predicted amino acid sequences of the encoded 
polypeptides are compared with sequences known for other coronaviruses, as well 
as that of human myelin basic protein. Based on polymerase chain reaction 
analysis, we have attempted to understand the divergence of the data obtained by 
our two groups. We show evidence for the presence of a minor mRNA 4 species
containing a large deletion and two smaller ORFs. 


Materials and Methods 
Cells and virus 


The human embryonic lung cell line L132 (Davis and Bolin, 1960) and the 
HCV-229E inoculum were obtained from the American Type Culture Collection 
(Rockville, MD). Cells were grown at 37°C in Earle’s minimal essential medium: 
Hank’s M199 (1:1, v/v) supplemented with 10% (v/v) fetal bovine serum (FBS), 
0.13% (w/v) sodium bicarbonate and 50 µg/ml Gentamycin (Gibco Canada,
Burlington, Ont., Canada). The virus was plaque-purified twice and quantitated by 
plaque assay as previously described (Daniel and Talbot, 1987) except that plaques 
were revealed after 7 days. Three passages on L132 cells at a MOI of 0.001 were 
performed to yield a viral stock with a titer of 7 x 10° PFU/ml. Viral infections 
were performed at 33°C in medium which contained FBS reduced to a level of 2%
(v/v).


Preparation of intracellular RNA


In order to optimize the yield of viral mRNA, we established the kinetics for 
synthesis of HCV-229E intracellular RNA in L132 cells and found a peak of 
[³H]uridine incorporation at 20 h post-infection at an MOI of 0.001. Thus,
intracellular RNA from uninfected cells or cells infected 20 h previously with 
HCV-229E was extracted according to Favaloro et al. (1980). Briefly, cells were 
lysed with a Dounce homogenizer, the cytoplasmic phase was isolated and deproteinized
with 200 µg/ml proteinase K and the preparations were treated with
DNase I (0.1 µg/mg of nucleic acid).


cDNA synthesis and cloning 


cDNA synthesis was carried out using a cDNA synthesis kit (Pharmacia, Dorval,
Canada). A synthetic oligonucleotide, 5’-TGGTACAATGTCACCCGTAC-3’,
complementary to nucleotides 187 to 207 at the 5’-end of the M gene (Jouvenne et
al., 1990), was used as primer. EcoRI adapters were added to blunt-ended,
double-stranded cDNAs, followed by ligation to EcoRI-cut, dephosphorylated 
pBluescript II vector (Stratagene, La Jolla, CA). E. coli XL-1 transformants 
containing HCV-229E cDNAs were identified by colony hybridization (Grunstein 
and Hogness, 1975) with the 5’-end-radiolabeled oligonucleotide. 


cDNA sequencing and sequence analysis 


Stepwise unidirectional deletions at both ends of the largest cDNA clone were 
created with exonuclease III, mung bean nuclease and deoxythionucleotide deriva- 
tives (Stratagene). The sequencing of both strands was performed by the plasmid 
sequencing technique (Hattori and Sakaki, 1986), with T7 DNA polymerase 
(Pharmacia). In order to confirm the nucleotide sequence, nine other clones of 
decreasing sizes were partially sequenced at their 5’-ends. Therefore, each nu- 
cleotide in the reported sequence is the result of three separate sequencing 
reactions. Sequence analyses were performed on an Apple Macintosh Plus com- 
puter with the MacGene Plus program (Applied Genetic Technology Inc., Fairview 
Park, OH), except for potential phosphorylation sites, which were identified with 
the PC/GENE program (Intelligenetics, Inc., Mountain View, CA). Potential 
leucine zippers and N-glycosylation sites were identified manually and confirmed 
with the PC/GENE program. The analysis of the RNA secondary structure was 
performed with the Fold prediction program (Zuker and Stiegler, 1981) contained 
in the Sequence Analysis Software Package of the University of Wisconsin Genet- 
ics Computer Group (Devereux et al., 1984), accessed through the CAN/SND 
Molecular Biology Database System (Ottawa, Canada). 


Polymerase chain reaction and DNA sequencing of the amplified product 


The polymerase chain reaction was performed by a modification of the original 
procedure (Saiki et al., 1988), which will be described elsewhere. Oligonucleotides 
used for amplification were synthesized on an Applied Biosystems synthesizer, 
with the following sequences: 5’-CTATTCCAACAGCTGGGTGTTCAC-3’ (anti- 
sense; Fig. 1, bases 396-419), 5’-AAGATCACACCGTGGCAGAGCTGC-3’ 
(sense; Fig. 1, bases 238-261). The amplified DNA fragment was ligated into the 
M13 mp18 vector and sequenced by the dideoxy chain termination method (Sanger 
et al., 1977), using Sequenase™ (United States Biochemical Corp., Cleveland, 


5’-    TGT GAA TCA ACT AAA CTT CCT TAT TAC GAC GTT GAA AAG ATC CAC ATA CAG TA    53
        C   E   S   T   K   L   P   Y   Y   D   V   E   K   I   H   I   Q   *

ORF4a  ATG GCT CTA GGT TTG TTC ACA TTG CAA CTT GTG TCT GCT GTT AAT CAA TCG CTT AGC AAT  113
        M   A   L   G   L   F   T   L   Q   L   V   S   A   V   N   Q   S   L   S   N    20

       GCG AAA GTT AGT GCT GAA GTT TCA CGA CAG GTT ATC CAA GAC GTG AAA GAT GGC ACT GTT  173
        A   K   V   S   A   E   V   S   R   Q   V   I   Q   D   V   K   D   G   T   V    40

       TTC TCA ACT TGC TAG CGTATACACTA  199
        F   S   T   C   *                44

ORF4b  ATG AGC CTC TTT CTT GTG TAT TTT GCT TTA TTT AAA GCA AGA TCA CAC CGT GGC AGA GCT  259
        M   S   L   F   L   V   Y   F   A   L   F   K   A   R   S   H   R   G   R   A    20

       GCT CTT ATA GTG TTT AAa ATT CTA TCT TAT CTC TCA ACT AAC GAC TTG TAC GTT GCT CTT  319
        A   L   I   V   F   K   I   L   S   Y   L   S   T   N   D   L   Y   V   A   L    40

       AGA GGA CGT ATT GAT aAA GAC CTC AGC CTT TCT AGA AAG GTT GAG TTA TAT AAC GGT GAA  379
        R   G   R   I   D   K   D   L   S   L   S   R   K   V   E   L   Y   N   G   E    60

       TGT GTA TAC TTG TTT TGT GAA CAC CCA GCT GTT GGA ATA GTC AAC ACA GAT TTC AAA TTA  439
        C   V   Y   L   F   C   E   H   P   A   V   G   I   V   N   T   D   F   K   L    80

       GAA ATC CAC TAA g  452
        E   I   H   *      83

ORF5   ATG TTC CTT aAG CTA GTG GAT GAT CAT GCT TTG GTT GTT AAT GTA CTA CTC TGG TGT GTG  512
        M   F   L   K   L   V   D   D   H   A   L   V   V   N   V   L   L   W   C   V    20

       GTG CTT ATA GTG ATA CTA CTA GTG TGT ATT ACA ATA ATT AAA CTA ACT AAG CTT TGT TTC  572
        V   L   I   V   I   L   L   V   C   I   T   I   I   K   L   T   K   L   C   F    40

       ACT TGC CAT ATG TTT TGT ACT AGA ACA ATT TAT GGC CCC ATT AAA AAT GTG TAC CAC ATT  632
        T   C   H   M   F   C   T   R   T   I   Y   G   P   I   K   N   V   Y   H   I    60

       TAC CAA TCA TAT ATG CAC ATA GAC CCT TTC CCT AAA CGA GTT ATT GAT CTC TAA ACTAAAC  693
        Y   Q   S   Y   M   H   I   D   P   F   P   K   R   V   I   D   L   *            77

       GACA ATG -3’  700
             M

Fig. 1. Nucleotide sequence of the unique regions of mRNAs 4 and 5, as well as the predicted amino
acid sequences of the encoded polypeptides. The intergenic sequences are doubly underlined and
termination codons indicated (*). The potential N-glycosylation (●) and phosphorylation (○) sites are also
indicated. The triangle shows the site of insertion of the additional 259 nucleotides reported by Raabe
and Siddell (1989b) and the differences in the amino acids from mRNAs 4 and 5 published by these
authors appear on the bottom row. The dots indicate that amino acid Leu-49 is followed by Met-1 in
the latter sequence. The right and left-pointing arrows indicate the region selected for PCR
amplification, using antisense (←) and sense (→) oligonucleotides corresponding to the underlined sequences.


OH) and [³⁵S]dATP (Amersham Canada, Oakville, Ont., Canada). In some experiments,
2 µl (6.6 pmol) of [α-³²P]dCTP (spec. act. 3000 Ci/mmol; ICN Biomedicals
Canada Ltd., Mississauga, Ont., Canada) was used in the amplification reaction
and the amplified products were separated on agarose gels, which were then washed
twice in water for 15 min each time, treated with 10% (w/v) trichloroacetic
acid for 15 min at room temperature and washed another two times in water
before exposure to X-ray film.


Results 


A cDNA library was prepared from HCV-229E intracellular RNA, using a 
specific oligonucleotide as primer. One clone, designated J22, contained a 1.7 kb 
insert that hybridized to viral mRNAs 1 through 6 in Northern blots (data not 
shown). The size of the RNAs, as determined with DNA markers, was similar to 
the ones obtained by Weiss and Leibowitz (1981). The J22 clone was selected for 
sequencing following creation of unidirectional deletions at both ends of the viral 
insert. Furthermore, the 5’-ends of nine other cDNA clones ranging in size from 
0.4 to 1.5 kb were sequenced. The nucleotide sequence of the unique regions of 
mRNAs 4 and 5, as well as the predicted amino acid sequence of the encoded 
polypeptides are presented in Fig. 1. Northern blot analysis was used to confirm 
the proposed mRNA assignments (data not shown). 

As shown in Fig. 1, messenger RNA 4 contains two ORFs: ORF4a and ORF4b. 
ORF4a extends from base 54 through base 185 and encodes a putative 44-amino 
acid polypeptide of 4653 Da. This ORF4a is preceded by a transcriptional 
initiation sequence, CUAAACU, which is located inside the 3’-end of the S gene. 
This sequence is similar to the consensus intergenic sequence found upstream of 
the N and M genes (Table 1). There is one potential N-glycosylation site in this 
predicted protein (Asn-15), as well as one potential phosphorylation site (casein 


TABLE 1
Characteristics of the HCV-229E 3’-open reading frames

RNA       Size ᵃ                        Intergenic    Distance from    Adjacent   Predicted polypeptide
species   mRNA ᵇ   Predicted   ORF      sequence      the initiation   ORF        Size ᶜ      Molecular
                   unique                             codon ᵃ                                 weight
                   region

4         3700     700         132      CUAAACU       36               ORF4a      44          4,653
                                                                                  (133) ᵈ     (15,300) ᵈ
                               249      absent        -                ORF4b      83          9,550
                                                                                  (88) ᵈ      (10,200) ᵈ
5         3000     400         231      UCAAAU        15               ORF5       77          9,046
6         2600     800         675 ᵉ    UCUAAACU ᵉ    8                M          225 ᵉ       25,822 ᵉ
7         1800     1800        1167     UCUAAACU ᶠ    10               N          389         43,366 ᶠ

ᵃ In bases.
ᵇ Determined from Northern blots.
ᶜ Number of amino acids.
ᵈ Data from Raabe and Siddell, 1989b; with mRNA assignment from Raabe et al., 1990.
ᵉ Data from Jouvenne et al., 1990.
ᶠ Data from Schreiber et al., 1989.


Fig. 2. Nucleotide sequence analysis from two regions of a sequencing gel corresponding to the
termination codon of ORF4a (asterisk in A) and the site of an apparent deletion from the published
sequence (arrows in B). The HCV-229E sequences shown correspond to nucleotides 171 to 197, located
at the 3’-end of ORF4a (A), and 272 to 293, located within ORF4b (B).


kinase II site: Ser-42). ORF4b extends from base 200 through base 448 and 
encodes a putative 83-amino acid polypeptide with a calculated molecular mass of 
9550 Da. There are three potential phosphorylation sites in this predicted protein 
(protein kinase C sites: Ser-15, Ser-51; casein kinase II site: Ser-32). The ORF4b 
protein contains a high proportion of leucine and isoleucine (20%) as well as basic 
(16%) residues. No consensus intergenic sequence was found upstream of this 
ORF. A notable difference with a previously published sequence from this region 
of the genome (Raabe and Siddell, 1989b) is the absence in our sequence of 259 
nucleotides, which code for an additional 56 and 33 amino acids in their ORF4a 
and ORF4b (Raabe et al., 1990), respectively. The site of this apparent deletion is 
indicated by a triangle in Fig. 1. Furthermore, a termination codon in our sequence 
(bases 186-188) implies that five amino acids in the published ORF4 are missing 
from our sequence, in the region between ORFs 4a and 4b. The actual sequencing 
gel data from these two regions of apparent divergence is shown in Fig. 2: panel A 


[Gel image: left panel, 4-h exposure; right panel, 16-h exposure; size markers at 564 and 125 bp]


Fig. 3. Agarose gel analysis of the radiolabeled amplification products of mRNA 4. Lane 1: L132 cells
used to propagate HCV-229E; lane 2: HCV-229E. HindIII-digested lambda DNA was used as the
molecular size marker (M). The numbers on the right indicate the lengths of the amplified fragments
and the numbers on the left indicate the relevant molecular size markers. The trichloroacetic
acid-treated agarose gel was exposed to X-ray film for either four (left panel) or sixteen (right panel)
hours.


shows the region of the ORF4a termination codon and panel B the region of the 
deletion. 
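
The ORF boundaries and the single N-glycosylation site quoted above can be checked with a short sketch. It assumes the ORF4a nucleotide sequence as read from Fig. 1 (bases 54 to 188, stop codon included) and the standard genetic code. The names `translate`, `ORFS` and `glyco_sites` are illustrative only; the original analysis was done with MacGene Plus and PC/GENE, not with this code.

```python
import re

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First and last coding base of each ORF, as stated in the text.
ORFS = {"ORF4a": (54, 185), "ORF4b": (200, 448), "ORF5": (453, 683)}
# (length in nucleotides, length in codons) for each ORF.
orf_sizes = {
    name: (end - start + 1, (end - start + 1) // 3)
    for name, (start, end) in ORFS.items()
}

# ORF4a as read from Fig. 1 (an assumption of this sketch: OCR-damaged
# codons were restored to the only readings compatible with A/C/G/T).
ORF4A_DNA = (
    "ATG GCT CTA GGT TTG TTC ACA TTG CAA CTT GTG TCT GCT GTT AAT CAA TCG CTT AGC AAT "
    "GCG AAA GTT AGT GCT GAA GTT TCA CGA CAG GTT ATC CAA GAC GTG AAA GAT GGC ACT GTT "
    "TTC TCA ACT TGC TAG"
).replace(" ", "")

orf4a_protein = translate(ORF4A_DNA)

# N-X-S/T sequon with X other than proline; positions are 1-based.
glyco_sites = [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", orf4a_protein)]

print(orf_sizes)
print(orf4a_protein, len(orf4a_protein), glyco_sites)
```

Under these assumptions the sketch reproduces the 132-, 249- and 231-nucleotide ORF lengths and places the only N-X-S/T sequon of ORF4a at Asn-15.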

In order to investigate further the significance of the apparent 259-nucleotide 
deletion in mRNA 4, we amplified a portion of this RNA by using specific 
oligonucleotides flanking the target sequences (indicated in Fig. 1) and the 
polymerase chain reaction (PCR). As shown in Fig. 3, a major 441 bp band 
corresponding to the size predicted from the sequence published by Raabe and 
Siddell (1989b) was obtained. However, a 182 bp signal was also faintly visible 
when a radioactive nucleotide was included in the PCR reaction. On the basis of 
our cDNA sequence, this smaller fragment may correspond to an mRNA lacking 
the 259 nucleotides, which was used as template for the two sequenced cDNA 
clones. Sequencing of the major 441 bp band confirmed the published sequence of
this region, with the exception of a T to G substitution at position 328 (Raabe and 
Siddell, 1989b), which results in the replacement of a tyrosine residue by an 


(A)
HCV-229E (ORF5)   M-FLK-L--VDDHALVVNVLLW-CVVLIVILL-VC-ITIIKLTKLCFTCH
TGEV Purdue ¹     MTEPRALTVIDDNGMYINIIFWFLLIIILILLSIALLNLIKLCMVC--CN   50

HCV-229E (ORF5)   MFCTRT-IYGPIKNVYHIYQSYMHIDPF-PKRVIDL
TGEV Purdue ¹     L--GRIVLIVPAQHAYDAYKNFMRIKAYNPDGALLA   86

(B) [Hydropathicity plots: HCV-229E; TGEV Purdue ¹]

Fig. 4. (A) Comparison of the predicted amino acid sequences of mRNA 5 (ORF5) from HCV-229E
(top row), TGEV Purdue (¹ Rasschaert et al., 1987; middle row) and TGEV Miller (² Wesley et al.,
1989; bottom row) aligned for maximum homology. Regions common to the two proteins are underlined 
and differences with TGEV Miller are indicated on the bottom row. (B) Hydropathicity profiles of the 
protein encoded by mRNA 5 of HCV-229E and TGEV Purdue determined according to Kyte and 
Doolittle (1982). The analysis was performed on an Apple Macintosh Plus computer with the MacGene 
Plus program (Applied Genetic Technology Inc., Fairview Park, OH). Each point is the mean 
hydropathicity of a span of seven residues. Peaks extending upwards correspond to hydrophobic regions 
and peaks extending downwards to hydrophilic areas. 


aspartic acid. Unfortunately, we were unable to recover sufficient amounts of the 
minor species to confirm that its sequence does correspond to the data obtained 
from cDNAs and presented in Fig. 1. 
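
The fragment sizes discussed above follow directly from the primer coordinates. The sketch below is only an arithmetic check: it assumes the Fig. 1 numbering, assumes that the additional 259 nucleotides lie between the two primers, and uses the Fig. 1 bases 396 to 419 as transcribed here. All variable names are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Primers as given in Materials and Methods.
SENSE_PRIMER = "AAGATCACACCGTGGCAGAGCTGC"      # Fig. 1, bases 238-261
ANTISENSE_PRIMER = "CTATTCCAACAGCTGGGTGTTCAC"  # Fig. 1, bases 396-419

SENSE_START = 238        # first base covered by the sense primer
ANTISENSE_END = 419      # last base covered by the antisense primer
EXTRA_NUCLEOTIDES = 259  # present in the published sequence, absent from ours

# Amplicon expected from the cDNA sequence of Fig. 1 (minor species).
short_product = ANTISENSE_END - SENSE_START + 1
# Amplicon expected if the 259 nucleotides are present (major species).
long_product = short_product + EXTRA_NUCLEOTIDES

# Sense-strand bases 396-419 as read from Fig. 1 (assumed transcription).
TARGET_396_419 = "GTGAACACCCAGCTGTTGGAATAG"
antisense_matches = reverse_complement(ANTISENSE_PRIMER) == TARGET_396_419

# A T-to-G change at the first position of either tyrosine codon
# yields an aspartic acid codon.
TYR_CODONS = ("TAT", "TAC")
ASP_CODONS = {"GAT", "GAC"}
mutated_codons = ["G" + codon[1:] for codon in TYR_CODONS]

print(short_product, long_product, antisense_matches, mutated_codons)
```

With these inputs the expected products are 182 bp and 441 bp, and both mutated codons encode aspartic acid, which is consistent with the tyrosine replacement described above.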

Messenger RNA 5 contains one single ORF of 231 nucleotides which extends 
from base 453 through base 683 and encodes a putative 77-amino acid polypeptide 
of 9046 Da containing a high proportion of leucine and isoleucine residues (27%). 
ORF5 is preceded by the sequence UCAAAU, which resembles the consensus 
intergenic sequence (Shieh et al., 1989) and is located at the 3’-end of ORF4b. The 
predicted amino acid sequence from ORF5 is identical to that of ORF5B (later
renamed ORF5) from Raabe and Siddell (1989b) except that four amino acids, one 
of which is part of a single potential N-glycosylation site (residue 47), are different 
in our sequence. Fig. 4A presents a comparison of the predicted amino acid 
sequences of the mRNA 5 (ORF5) between HCV-229E, TGEV Purdue (Ras- 
schaert et al., 1987) and TGEV Miller (Wesley et al., 1989). The ORF5 of 
HCV-229E exhibits 32.9% homology with TGEV Purdue and 31.7% homology 
with TGEV Miller. This predicted protein of HCV-229E has a similar molecular 


      | <- MS epitope -> |        | <- MS epitope ->
        | <- encephalitogenic site -> |  | <- encephalitogenic site, LOU/M rats -> |
                                          | <- encephalitogenic site, guinea pigs -> |
      | <- MS immunodominant site -> |
MBP   82-DENPVVHFFKNIVTPRTPPPSQGKGRG LSLSR FSWGAEGQKPGFGYGGR-130
229E  21-ALIVFKILSYLSTNDLYVALRGRIDKD LSLSR KVELYNGECVYLFCEHPA-70

Fig. 5. Homology between myelin basic protein (MBP) and ORF4b of HCV-229E at the level of a
five-amino acid sequence: LSLSR. The MBP immunodominant site recognized by T cells from MS
patients, two epitopes recognized by T-cell clones established from MS patients, and sites
demonstrating encephalitogenic potential in experimental animals are indicated.


weight to the one from TGEV (9,241; Rasschaert et al., 1987), is relatively basic 
(14%) and also exhibits an extremely hydrophobic profile (Fig. 4B). 
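
A hydropathicity profile of the kind shown in Fig. 4B can be sketched as a sliding-window mean of Kyte and Doolittle (1982) indices over spans of seven residues. The code below is an illustration under stated assumptions, not the MacGene Plus implementation: the ORF5 protein is the translation of the Fig. 1 sequence as transcribed here, and `hydropathy_profile` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy index for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every full window of `window` residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Predicted ORF5 product (77 residues), translated from Fig. 1.
ORF5_PROTEIN = (
    "MFLKLVDDHALVVNVLLWCV"
    "VLIVILLVCITIIKLTKLCF"
    "TCHMFCTRTIYGPIKNVYHI"
    "YQSYMHIDPFPKRVIDL"
)

profile = hydropathy_profile(ORF5_PROTEIN)

# Report the most hydrophobic window (1-based residue numbering).
peak_index = max(range(len(profile)), key=profile.__getitem__)
print(f"windows: {len(profile)}")
print(f"most hydrophobic span starts at residue {peak_index + 1}: "
      f"{ORF5_PROTEIN[peak_index:peak_index + 7]} ({profile[peak_index]:.2f})")
```

Positive window means correspond to the upward (hydrophobic) peaks of Fig. 4B and negative means to the hydrophilic troughs.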

No significant homologies were found between our predicted amino acid 
sequences of mRNAs 4 and 5 from HCV-229E and those derived from mRNA 4 of 
TGEV Purdue (previously designated 3: Rasschaert et al., 1987), or TGEV Miller 
(Wesley et al., 1989), mRNA 3 from IBV M41 (previously designated D: Niesters 
et al., 1986; new nomenclature from: Cavanagh et al., 1990), mRNA 5 from IBV 
Beaudette (previously designated B: Boursnell and Brown, 1984), mRNAs 4 and 5 
from MHV-JHM (Skinner and Siddell, 1985; Skinner et al., 1985), or mRNA 5 
from MHV-A59 (Budzilowicz and Weiss, 1987). 

We have also compared the HCV-229E sequences with the human myelin basic 
protein (MBP; Roth et al., 1987) and found a five-amino acid homology between 
the protein encoded by ORF4b and MBP. This sequence is the following: LSLSR 
(residues 48 to 52 for HCV-229E and residues 109 to 113 for human MBP; Figs. 1 
and 5). This sequence belongs to exon 5 of the human MBP gene (Roth et al., 
1987) and is conserved among bovine, chimpanzee, guinea pig, murine, porcine, 
rabbit and rat MBPs (Martenson, 1983). As shown in Fig. 5, the LSLSR sequence 
is situated in a biologically relevant area of human MBP. It is contained within a 
site shown to be encephalitogenic for LOU/M rats (residues 109-128: Hashim et 
al., 1991), it overlaps by two amino acids an encephalitogenic site for guinea pigs 
(residues 112-124: Carnegie, 1971) and it is situated two amino acids downstream 
from an encephalitogenic site for SJL mice (residues 94-107: Fritz and McFarlin, 
1989). Moreover, it is located eight amino acids downstream from an immunodom- 
inant MBP epitope recognized by the T cells of multiple sclerosis patients 
(residues 82-100: Ota et al., 1990) and is contained within another site recognized
by some T cell clones established from MS patients (residues 108-148: Jingwu et 
al., 1990), as well as four amino acids downstream from another such epitope 
(residues 86-105: Richert et al., 1989). It is also present in the putative protein 
encoded by ORF5A (later renamed ORF4b) described by Raabe and Siddell
(1989b). The LSLSR homology region was searched among the genes encoding 
non-structural proteins (except for the putative RNA polymerase) from the other 
coronaviruses. The best homology found was the sequence LSLR belonging to
ORF X2b of TGEV Purdue (Rasschaert et al., 1987), ORFB of TGEV Miller 
(Wesley et al., 1989) and ORF2 of TGEV FS772/70 (Britton et al., 1989), which 
correspond to ORF4b of HCV-229E at the level of the structural organization of 
the genome. 
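
The search for a shared stretch such as LSLSR can be illustrated with a longest-common-substring comparison of the two fragments shown in Fig. 5. This is a minimal sketch, assuming the fragments as printed in Fig. 5 (MBP residues 82 to 130 and ORF4b residues 21 to 70); the function name and the dynamic-programming approach are choices made here and are not taken from the original study.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous run of characters shared by a and b."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of residues.
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]


# Fragments as printed in Fig. 5.
MBP_82_130 = "DENPVVHFFKNIVTPRTPPPSQGKGRGLSLSRFSWGAEGQKPGFGYGGR"
ORF4B_21_70 = "ALIVFKILSYLSTNDLYVALRGRIDKDLSLSRKVELYNGECVYLFCEHPA"

shared = longest_common_substring(MBP_82_130, ORF4B_21_70)

# Convert fragment offsets back to residue numbers in each protein.
mbp_start = 82 + MBP_82_130.index(shared)
orf4b_start = 21 + ORF4B_21_70.index(shared)

print(shared, mbp_start, orf4b_start)
```

For these two fragments the longest shared run is the five-residue sequence, starting at residue 109 of MBP and residue 48 of the ORF4b product, in agreement with the positions given above.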


Discussion 


A cDNA library was constructed from viral RNA extracted at the optimum time 
after infection of cultured cells, using an oligonucleotide primer complementary to 
a 5’-region of the previously published sequence of the M gene (Jouvenne et al., 
1990). 

The nucleotide sequences of the genes located between the S and M genes were 
obtained (Fig. 1). The apparently minor species (Fig. 3) of messenger RNA 4 
contains two non-overlapping ORFs and messenger RNA 5 contains only one 
ORF. Surprisingly, 259 nucleotides from the sequence reported by Raabe and 
Siddell (1989b) are absent from our sequence, which was determined from two 
cDNA clones. The absence of these nucleotides in our sequence and the mRNA 
assignments suggested by our Northern blot analysis, which are consistent with 
data reported by Raabe et al. (1990), have major implications for the structural 
organization of this region of the genome. We observe two ORFs in mRNA 4 and 
only one ORF in mRNA 5, an organization analogous to TGEV Purdue (Ras- 
schaert et al., 1987). The initial report by Raabe and Siddell (1989b) of one ORF 
in mRNA 4 and two ORFs in mRNA 5 is similar to what has been reported for 
MHV (Skinner and Siddell, 1985; Skinner et al., 1985; Budzilowicz and Weiss, 
1987). Our observation of multiple ORFs on mRNA 4 of HCV-229E confirms a
recent report (Raabe et al., 1990), and has also been reported for IBV, where 
three non-overlapping ORFs were described (Boursnell et al., 1985; Niesters et al., 
1986). The unique ORF reported here for HCV-229E confirms the revised 
assignment reported by Raabe et al. (1990), and is also seen in TGEV (Rasschaert 
et al., 1987; Wesley et al., 1989). 

Surprisingly, the 5’-end ORF of mRNA 4 of HCV-229E (ORF4a in our 
sequence) is shorter than the ORF4 (later renamed ORF4a) reported by Raabe 
and Siddell (1989b). The latter ORF terminates in the observed additional se- 
quence within which their ORF5A (later renamed ORF4b) initiates. The 3'-end of 
their ORF5A (ORF 4b) is found in our ORF4b, of which the mRNA 4 assignment 
was confirmed by Northern blot analysis. These missing nucleotides in our se- 
quence led us to predict the existence of two smaller proteins: 4653 Da for ORF4a 
and 9550 Da for ORF4b. This observation differs from the previous report, where 
ORF4 (ORF4a) and ORF5A (ORF4b) specify 15300 Da and 10200 Da polypep- 
tides, respectively (Table 1). The two non-overlapping ORFs found in the unique 
region of mRNA 4 of TGEV Purdue encode putative 7.7 and 18.7 kDa polypep- 
tides (Rasschaert et al., 1987) and these two ORFs are apparently present as two 
distinct RNA species in TGEV Miller, with the second ORF predicting a larger 
27.7 kDa protein (Wesley et al., 1989). On the other hand, the protein encoded by
ORF5 is very similar to the previously reported sequence (Raabe and Siddell,
1989b), with only four amino acid changes, one of which involves the absence of a
potential N-glycosylation site in our sequence (amino acid 47). 

The only significant homology between the predicted amino acid sequences 
encoded by mRNAs 4 and 5 of HCV-229E and other coronaviruses was found at 
the level of ORF5 which encodes a hydrophobic, basic and leucine-rich 9.0 kDa 
protein, which shows 32 to 33% homology with the 9.2 kDa protein encoded by 
mRNA 5 of the Miller and Purdue strains of TGEV, respectively (Wesley et al., 
1989; Rasschaert et al., 1987) (Fig. 4). Despite a large proportion of leucine 
residues (14%) in the putative proteins encoded by mRNAs 4 and 5, no sequence 
motif consistent with a DNA-binding leucine zipper was identified in these regions 
(Abel and Maniatis, 1989). The significance of the potential phosphorylation sites 
identified in ORF4a (one site) and ORF4b (three sites) remains to be determined 
but could be involved in the functions of these potential non-structural proteins. 

The extensive structural differences in the organization of HCV-229E mRNAs 4 
and 5 deduced by a comparison of the sequence presented here and the one 
reported by Raabe and Siddell (1989b) suggest that the gene products of mRNA 4 
of HCV-229E are not essential for virus replication in cell culture. Nevertheless, it 
remains possible that their functions are required for in vivo replication and 
pathogenesis. Our results extend to the human coronavirus previous observations 
on the non-essential nature for virus replication in transformed cells of non-structural
proteins ns2, ns4 and ns5a and the HE structural protein of murine hepatitis
virus (Schwarz et al., 1990; Yokomori and Lai, 1991; Yokomori et al., 1991). These 
observations are in contrast to the essential nature of non-structural proteins of 
other viruses, for example the ns1 and ns2 proteins of parvovirus H-1, which are 
required for viral DNA replication and efficient viral protein synthesis, respectively 
(Rhode, 1989; Li and Rhode, 1991). Interestingly, the putative non-structural 
protein encoded by ORF4b in TGEV was reported to have a predicted size of 
either 18.7 or 27.7 kDa, depending on the viral strain studied (Rasschaert et al., 
1987; Wesley et al., 1989). Also, BCV mRNA 4 was shown to contain two open 
reading frames coding for proteins of 4.9 and 4.8 kDa, which appear to have arisen 
by a single base deletion from a single ORF encoding an 11 kDa protein, with
significant homology with its counterpart in MHV (Abraham et al., 1990). 

We attempted to localize putative nucleic acid binding regions within ORFs 4a, 
4b and 5: no sequence compatible with either zinc fingers (Evans and Hollenberg, 
1988) or leucine zippers was found, although this does not preclude their function
in genome transcription and replication. 

Finally, it is interesting to note that a consensus transcriptional initiation 
sequence is found upstream of mRNAs 4 and 5 (data summarized in Table 1). This 
is consistent with data accumulated so far for all coronaviruses (except for Raabe 
and Siddell, 1989b; although revised in Raabe et al., 1990). 
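
The distances between each intergenic sequence and the following initiation codon (Table 1) can be recomputed from the sequence of Fig. 1. The sketch below assumes the upstream windows as transcribed from Fig. 1, each ending immediately before the ATG of the named ORF; `distance_to_start` and the `UPSTREAM` dictionary are illustrative names.

```python
def distance_to_start(upstream, motif):
    """Nucleotides between the end of `motif` and the initiation codon.

    `upstream` must end immediately before the ATG. The motif may be
    written as RNA (U) or DNA (T). Returns None if the motif is absent.
    """
    motif = motif.replace("U", "T")
    index = upstream.find(motif)
    if index < 0:
        return None
    return len(upstream) - (index + len(motif))


# Upstream windows read from Fig. 1 (assumed transcription), paired with
# the intergenic sequence listed in Table 1.
UPSTREAM = {
    # bases 1-53, preceding the ORF4a ATG at base 54
    "ORF4a": (
        "TGT GAA TCA ACT AAA CTT CCT TAT TAC GAC GTT GAA AAG ATC CAC ATA CAG TA",
        "CUAAACU",
    ),
    # bases 431-452, preceding the ORF5 ATG at base 453
    "ORF5": ("TTC AAA TTA GAA ATC CAC TAA G", "UCAAAU"),
    # bases 678-697, preceding the M gene ATG at base 698
    "M": ("GAT CTC TAA ACTAAAC GACA", "UCUAAACU"),
}

distances = {
    orf: distance_to_start(window.replace(" ", ""), motif)
    for orf, (window, motif) in UPSTREAM.items()
}

print(distances)
```

With these windows the computed distances are 36, 15 and 8 nucleotides, the values listed in Table 1 for ORF4a, ORF5 and the M gene.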

Further studies are needed to ascertain the functions of the putative non-struct- 
ural proteins encoded by the unique regions of mRNAs 4 and 5. The accumulation 
of sequence data for human coronaviruses will allow sequence-specific amplification
of the genome of these viruses from pathological specimens, and the production
of specific antibodies which could be used to assess the function of the
non-structural proteins. 

Examination of the HCV-229E sequence did not reveal any features indicative 
of jumping of the RNA polymerase complex, such as a site of strong secondary 
structure or repeated sequences in the vicinity of the deletion site in ORF4b. 
Indeed, predicted RNA secondary structures were not more extensive in this 
region (results not shown). Therefore, our results are indicative of a possible 
genetic variability of a human coronavirus. Such genetic polymorphism is known to 
affect the pathogenesis of murine (Parker et al., 1989; Taguchi and Fleming, 1989) 
and porcine coronaviruses (Rasschaert et al., 1990). In the latter case, a few 
genomic deletions, including 672 nucleotides in the S gene, appeared to have 
modified viral tropism from the gastro-intestinal to the respiratory tract of infected 
pigs. Further studies are needed to confirm the genetic variability of human 
coronaviruses. Nevertheless, the findings reported in the present study, together 
with those reported for murine and porcine coronaviruses, suggest that genomic 
deletions, as well as recombination, are a source of diversity in the coronavirus 
family. 

We have compared the protein encoded by ORF4b of HCV-229E with the 
human myelin basic protein and found a five-amino acid homology, the sequence 
of which is LSLSR (Fig. 5). Since it belongs to exon 5 of the human MBP gene, this 
sequence is spliced out in some human and murine variants (Roth et 
al., 1987; Takahashi et al., 1985). Assuming that the twenty amino acids are equally 
represented, the probability of finding a homology of five residues between two 
different proteins is 1 in 20⁵ (3.2 x 10⁶). Although this homologous sequence is 
short, it could be sufficient to constitute a common epitope between the protein 
encoded by ORF4b of HCV-229E and the human MBP (Oldstone, 1987). Indeed, 
the minimum size of both B and T epitopes has been reported to be five amino 
acids (Geysen et al., 1989; Reddehase et al., 1989). On the other hand, several 
lines of evidence suggest that multiple sclerosis could be an autoimmune disease 
induced by a virus, and one of the possible pathogenic 
mechanisms could involve molecular mimicry resulting from a sequence homology 
between a viral protein and myelin basic protein (Watanabe et al., 1983; Jahnke et 
al., 1985). If we consider the hypothesis of an autoimmune disease, the search for 
sequence homologies then takes on its full significance. 
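
The chance estimate given above (1 in 20⁵) applies to one fixed pair of five-residue positions. As a minimal sketch, the arithmetic can be checked as follows, together with a rough expected number of chance matches when every five-residue window of one protein is compared with every window of the other. The 83-residue length of the ORF4b product is taken from the Summary; the 170-residue length used for human MBP and the function names are assumptions introduced for illustration only, and equal amino acid frequencies are assumed as in the text.

```python
from fractions import Fraction

ALPHABET_SIZE = 20  # amino acids, assumed equally represented (as in the text)
MATCH_LENGTH = 5    # length of the shared LSLSR segment


def p_match_at_fixed_position(k=MATCH_LENGTH, alphabet=ALPHABET_SIZE):
    """Probability that two aligned k-residue windows are identical by chance."""
    return Fraction(1, alphabet ** k)


def expected_chance_matches(len_a, len_b, k=MATCH_LENGTH, alphabet=ALPHABET_SIZE):
    """Expected number of identical k-residue windows over all window pairs."""
    windows_a = max(len_a - k + 1, 0)
    windows_b = max(len_b - k + 1, 0)
    return windows_a * windows_b * Fraction(1, alphabet ** k)


p = p_match_at_fixed_position()
print(p.denominator)  # 3200000, i.e. 3.2 x 10^6

orf4b_len = 83  # ORF4b product length, from the Summary
mbp_len = 170   # ASSUMED length for human MBP, for illustration only
print(float(expected_chance_matches(orf4b_len, mbp_len)))
```

Under these assumptions the expectation over all window pairs is larger than the single-position probability but remains well below one.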

The homologous sequence LSLSR overlaps by two amino acids an encephalito- 
genic site of human MBP (Carnegie, 1971). These two residues belong to a group 
of three amino acids which seem essential for the encephalitogenic potential 
(Lennon et al., 1970). Moreover, an encephalitogenic site for LOU/M rats 
encompasses this homology region (Hashim et al., 1991) and a similar site for SJL 
mice was reported slightly upstream (Fritz and McFarlin, 1989). Similar results 
were reported by Fujinami and Oldstone (1985), who have found a sequence 
homology between hepatitis B virus polymerase and the encephalitogenic site of 
rabbit MBP. These authors suggested that viral infection may trigger a neurologic 
disease resulting from a mechanism of molecular mimicry. Furthermore, the 
LSLSR homologous sequence is located eight amino acids downstream from an 
immunodominant MBP epitope recognized by the T cells of multiple sclerosis 
patients (Ota et al., 1990). This immunodominant epitope may be encephalitogenic 
in some DR2+ individuals. The biological importance of this MBP region was also 
suggested by Richert et al. (1989), who showed that nine out of forty MBP-specific 
human CD4+ T cell clones recognized MBP residues 86 to 105. Also, Jingwu et al. 
(1990) reported that four out of 17 T cell clones established from MS patients 
recognized residues 108 to 148, a region which encompasses the homology region 
with HCV-229E. Finally, it is important to note that Allegretta et al. (1990) have 
isolated, from multiple sclerosis patients, circulating T cells which recognize MBP, 
lending further support to the importance of such an autoimmune reaction in 
pathology. 

In order to verify the hypothesis that HCV-229E may induce an autoimmune 
disease, it could be interesting to use a synthetic peptide corresponding to the 
LSLSR sequence. This peptide may, on the one hand, serve in proliferation tests of T 
cells from MS patients and, on the other hand, be injected into mice in order to 
examine its autoimmune potential. 